Morphological Disambiguation by Voting Constraints 
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Abstract 

We present a constraint-based morpholog- 
ical disambiguation system in which indi- 
vidual constraints vote on matching mor- 
phological parses, and disambiguation of 
all the tokens in a sentence is performed 
at the end by selecting parses that receive 
the highest votes. This constraint applica- 
tion paradigm makes the outcome of the 
disambiguation independent of the rule se- 
quence, and hence relieves the rule devel- 
oper from worrying about potentially con- 
flicting rule sequencing. Our results for 
disambiguating Turkish indicate that using 
about 500 constraint rules and some addi- 
tional simple statistics, we can attain a re- 
call of 95-96% and a precision of 94-95% 
with about 1.01 parses per token. Our sys- 
tem is implemented in Prolog and we are 
currently investigating an efficient imple- 
mentation based on finite state transduc- 
ers. 

1 Introduction 

Automatic morphological disambiguation is an im- 
portant component in higher level analysis of natural 
language text corpora. There has been a large num- 
ber of studies in tagging and morphological disam- 
biguation using vario us techniques s uch as statisti 



1992; 


DeRose, 1988 


), constraint-based techniques 


(Karlsson et al., 1995; 


Voutilainen, 1995b 




Vouti- 


lainen, Heikkila, and Anttila, 1992; 


Voutilainen 


and Tapanainen, 1993; Ofiazer and Kuruoz, 1994; 


Ofiazer and Tiir, 1996 


and transformation-based 


techniques ( 


BriU, 1992; 


Brill, 1994; 


Brill, 1995). 



This paper presents a novel approach to constraint 
based morphological disambiguation which relieves 
the rule developer from worrying about conflicting 



rule ordering requirements. The approach depends 
on assigning votes to constraints according to their 
complexity and specificity, and then letting con- 
straints cast votes on matching parses of a given 
lexical item. This approach does not reflect the out- 
come of matching constraints to the set of morpho- 
logical parses immediately. Only after all applicable 
rules are applied to a sentence, all tokens are dis- 
ambiguated in parallel. Thus, the outcome of the 
rule applications is independent of the order of rule 
applications. R ule o rdering issue has been discussed 
by Voutilainen( |l994| ) , but he has recently indicated^ 
that insensitivity to rule ordering is n ot a pr operty 
of their system (although Voutilainen( |l995a ) states 
that it is a very desirable property) but rather is 
achieved by extensively testing and tuning the rules. 

In the following sections, we present an overview 
of the morphological disambiguation problem, high- 
lighted with examples from Turkish. We then 
present our approach and results. We finally con- 
clude with a very brief outline of our investigation 
into efficient implementations of our approach. 

2 Morphological Disambiguation 

In all languages, words are usually ambiguous in 
their parts-of-speech or other morphological fea- 
tures, and may represent lexical items of different 
syntactic categories, or morphological structures de- 
pending on the syntactic and semantic context. In 
languages like English, there are a very small number 
of possible word forms that can be generated from 
a given root word, and a small number of part-of- 
speech tags associated with a given lexical form. On 
the other hand, in languages like Turkish or Finnish 
with very productive agglutinative morphology, it 
is possible to produce thousands of forms (or even 
millions ( Hankamcr, 1989| )) from a given root word 
and the kinds of ambiguities one observes are quite 



^Voutilainen, Private communication. 



different than what is observed in languages hke En- 
ghsh. 

In Turkish, there are ambiguities of the sort 
typically found in languages like English (e.g., 
book/noun vs book/verb type). However, the ag- 
glutinative nature of the language usually helps res- 
olution of such ambiguities due to the restrictions 
on morphotactics of subsequent morphemes. On the 
other hand, this very nature introduces another kind 
of ambiguity, where a lexical form can be morpho- 
logically interpreted in many ways not usually pre- 
dictable in advance. Furthermore, Turkish allows 
very productive derivational processes and the infor- 
mation about the derivational structure of a word 



form is usually crucial for disambiguation ( Oflazer 
and Tiir, 199^ ). 



Most kinds of morphological ambiguities that we 
have observed in Turkish typically fall into one the 
following classes]^ 

1. the form is uninflected and assumes the default 
inflectional features, e.g., 

1 . taS (made of stone) 
[[CAT=ADJ] [RQOT=taS]] 

2. taS (stone) 

[ [CAT=NOUN] [ROOT=taS] 
[AGR=3SG] [PQSS=NONE] [CASE=NOM] ] 

3. taS (overflow!) 

[ [CAT=VERB] [ROOT=taS] [SENSE=POS] 
[TAM1=IMP] [AGR=2SG] ] 

2. Lexically different affixes (conveying different 
morphological features) surface the same due to 
the morphographemic context, e.g., 

1. ev+ [n] in (of the house) 
[ [CAT=NOUN] [ROOT=ev] 

[AGR=3SG] [POSS=NONE] [CASE=GEN] ] 

2. ev+in (your house) 
[ [CAT=NOUN] [ROOT=ev] 

[AGR=3SG] [PQSS=2SG] [CASE=NOM] ] 

3. The root of one of the parses is a prefix string of 
the root of the other parse, and the parse with 
the shorter root word has a suffix which surfaces 
as the rest of the longer root word, e.g.. 



^Output of the morphological analyzer is edited for 
clarity, and English glosses have been given. We have 
also provided the morpheme structure, where [. . .]s, in- 
dicate elision. Glosses are given as linear feature value 
sequences corresponding to the morphemes (which are 
not shown). The feature names are as follows: CAT-major 
category, TYPE- minor category, ROOT-main root form, AGR 
-number and person agreement, POSS - possessive agree- 
ment, CASE - surface case, CONV - conversion to the cat- 
egory following with a certain sufBx indicated by the 
argument after that, TAMl-tense, aspect, mood marker 
1, SENSE- verbal polarity. Upper cases in morphological 
output indicates one of the non-ASCII special Turkish 
characters: e.g., G denotes g, U denotes ii, etc. 



1. koyu+[u]n (your dark (thing)) 

[ [CAT=AD J] [ROOT=koyu] [CONV=NOUN=NONE] 
[AGR=3SG] [P0SS=2SG] [CASE=NOM] ] 

2. koyun (sheep) 

[ [CAT=NOUN] [RQOT=koyun] 
[AGR=3SG] [POSS=NONE] [CASE=NOM] ] 

3. koy+[n]un (of the bay) 
[[CAT=NOUN] [RQOT=koy] 

[AGR=3SG] [POSS=NQNE] [CASE=GEN] ] 

4. koy+un (your bay) 
[[CAT=NOUN] [ROOT=koy] 

[AGR=3SG] [P0SS=2SG] [CASE=NOM] ] 

5 . koy+ [y] un (put ! ) 

[ [CAT=VERB] [ROOT=koy] [SENSE=POS] 
[TAM1=IMP] [AGR=2PL] ] 

4. The roots take different numbers of unrelated 
inflectional and/or derivational suffixes which 
when concatenated turn out to have the same 
surface form, e.g., 

1. yap+madcoi (without having done (it)) 
[[CAT=VERB] [RODT=yap] [SENSE=PQS] 

[CONV=ADVERB=MADAN] ] 

2. yap+ma+dan (from doing (it)) 

[ [CAT=VERB] [ROOT=yap] [SENSE=POS] 
[CONV=NOUN=MA] [TYPE=INFINITIVE] 
[AGR=3SG] [PQSS=NONE] [CASE=ABL] ] 

5. One of the ambiguous parses is a lexicalized 
form while another is form derived by a pro- 
ductive derivation as in 1 and 2 below. 

6. The same suffix appears in different positions in 
the morphotactic paradigm conveying different 
information as in 2 and 3 below. 

1. uygulama / (application) 
[ [CAT=NOUN] [RQOT=uygulama] 

[AGR=3SG] [POSS=NQNE] [CASE=NOM] ] 

2. uygula+ma / ((the act of) applying) 

[ [CAT=VERB] [ROOT=uygula] [SENSE=PDS] 
[CONV=NQUN=MA] [TYPE=INFINITIVE] 
[AGR=3SG] [PQSS=NONE] [CASE=NQM] ] 

3 . uygula+ma / (do not apply ! ) 

[ [CAT=VERB] [ROOT=uygula] [SENSE=NEG] 
[TAM1=IMP] [AGR=2SG] ] 



The main intent of our system is to achieve mor- 
phological disambiguation by choosing for a given 
ambiguous token, the correct parse in a given con- 
text. It is certainly possible that a given token may 
have multiple correct parses, usually with the same 
inflectional features, or with inflectional features not 
ruled out by the syntactic context, but one will be 
the "correct" parse usually on semantic grounds. 

We consider a token fully disambiguated if it has 
only one morphological parse remaining after auto- 
matic disambiguation. We consider a token as cor- 
rectly disambiguated, if one of the parses remaining 



for that token is the correct intended parse. We eval- 
uate the resulting disambiguated text by a number 
of metrics defined as follows ( Voutilaincn, 1995a): 



Ambiguity ~ 



^Parses 
^Tokens 



Recall 



Precision 



^Tokens Correctly Disambiguated 
^Tokens 

^Tokens Correctly Disambiguated 
4j^Parses 



In the ideal case where each token is uniquely and 
correctly disambiguated with the correct parse, both 
recall and precision will be 1.0. On the other hand, a 
text where each token is annotated with all possible 
parses,^ the recall will be 1.0, but the precision will 
be low. The goal is to have both recall and precision 
as high as possible. 

3 Constraint-based Morphological 
Disambiguation 

This section outlines our approach to constraint- 
based morphological disambiguation where con- 
straints vote on matching parses of sequential to- 
kens. 

3.1 Constraints on morphological parses 

We describe constraints on the morphological parses 
of tokens using rules with two components 

R — (Ci, C2, • • • , C„; V) 

where the are (possibly hierarchical) feature con- 
straints on a sequence of the morphological parses, 
and V is an integer denoting the vote of the rule. 

To illustrate the flavor of our rules we can give the 
following examples: 

1. The following rule with two constraints matches 
parses with case feature ablative, preceding a 
parse matching a postposition subcategorizing 
for an ablative nominal form. 

[ [case : abl] , [cat : postp , subcat : abl] ] 

2. The rule 

[ [agr : '2SG' , case : gen] , [cat :noun,poss : ' 2SG'] ] 

matches a nominal form with a possessive 
marker 2SG, following a pronoun with 2SG 
agreement and genitive case, enforcing the sim- 
plest form of noun phrase constraints. 



3. In general constraints can make references to 
the derivational structure of the lexical form 
and hence be hierarchical. For instance, the fol- 
lowing rule is an example of a rule employing a 
hierarchical constraint: 

[ [cat : adj , stem: [taml :narr] ] , 
[cat : noun, stem: no] ] 

which matches the derived participle reading of 
a verb with narrative past tense, if it is followed 
by an underived noun parse. 

3.2 Determining the vote of a rule 

There are a number of ways votes can be assigned 
to rules. For the purposes of this work the vote of 
a rule is determined by its static properties, but it 
is certainly conceivable that votes can be assigned 
or learned by using statistics from disambiguated 
corpora.^ For static vote assignment, intuitively, we 
would like to give high votes to rules that are more 
specific: i.e., to rules that have 

• higher number of constraints, 

• higher number of features in the constraints, 

• constraints that make reference to nested stems 
(from which the current form is derived), 

• constraints that make reference to very specific 
features or values. 

Let R — (Ci, C2, • • • , C„; y) be a constraint rule. 
The vote V is determined as 



V = Y^V{C.) 

i=l 



where V{Ci) is the contribution of constraint Ci to 
the vote of the rule R. A (generic) constraint has 
the following form: 



C^[{h:vi)k{h:v2)k---{f„ 



where fi is the name of a morphological feature, and 
Vi is one of the possible values for that feature. The 
contribution of fi : Vi in the vote of a constraint 
depends on a number of factors: 

1. The value Vi may be a distinguished value that 
has a more important function in disambigua- 
tion. I In this case, the weight of the feature 
constraint is w{vi){> 1). 



Assuming no unknown words. 



*We have left this for future work. 

^For instance, for Turkish we have noted that 
the genitive case marker is usually very helpful in 
disambiguation. 



2. The feature itself may be a distinguished feature 
which has more important function in disam- 
biguation. In this case the weight of the feature 
is w{f,){> 1). 

3. If the feature fi refers to the stem of a de- 
rived form and the value part of the feature con- 
straint is a full fledged constraint C" on the stem 
structure, the weight of the feature constraint is 
found by recursively computing the vote of C 
and scaling the resulting value by a factor (2 in 
our current system) to improve its specificity. 

4. Otherwise, the weight of the feature constraint 
is 1. 

For example suppose we have the following con- 
straint: 

[cat: noun, case: gen, 

stem: [cat : adj , stem: [cat :v] , suffix=mis]] 

Assuming the value gen is a distinguished value 
with weight 4 (cf., factor 1 above), the vote of this 
constraint is computed as follows: 

1. cat : noun contributes 1, 

2. case : gen contributes 4, 

3. stem: [cat : adj , stem: [cat : v] , suffix=mis] 

contributes 8 computed as follows: 

(a) cat: adj contributes 1, 

(b) suff ix=mis contributes 1, 

(c) stem: [cat:v] contributes 2 = 2*1, the 1 
being from cat : v, 

(d) the sum 4 is scaled by 2 to give 8. 

4. Votes from steps 1, 2 and 3(d) are added up to 
give 13 as the constraint vote. 

We also employ a set of rules which express pref- 
erences among the parses of single lexical form in- 
dependent of the context in which the form occurs. 
The weights for these rules are currently manually 
determined. These rules give negative votes to the 
parses which are not preferred or high votes to cer- 
tain parses which are always preferred. Our experi- 
ence is that such preference rules depend on the kind 
of the text one is disambiguating. For instance if one 
is disambiguating a manual of some sort, imperative 
readings of verbs are certainly possible, whereas in 
normal plain text with no discourse, such readings 
are discouraged. 



3.3 Voting and selecting parses 

A rule R — (Ci, C2, • • ■ , C„; F) will match a se- 
quence of tokens Wi, Wi+i, • • ■ , Wi^n-i within a sen- 
tence wi through Ws if some morphological parse of 
every token Wj,i < j < i + n — 1 is subsumed by 
the corresponding constraint Cj_i+i. When all con- 
straints match, the votes of all the matching parses 
are incremented hy V. If a given constraint matches 
more than one parse of a token, then the votes of all 
such matching parses are incremented. 

After all rules have been applied to all token po- 
sitions in a sentence and votes are tallied, morpho- 
logical parses are selected in the following manner. 
Let vi and Vh be the votes of the lowest and high- 
est scoring parses for a given token. All parses with 
votes equal to or higher than vi + m * {vh — vi) are 
selected with m (0 < m < 1) being a parameter, 
m = 1 selects the highest scoring parse(s). 

4 Results from Disambiguating 
Turkish Text 

We have applied our approach to disambiguating 
Turkish text. Raw text is processed by a prepro- 
cessor which segments the text into sentences using 
various heuristics about punctuation, and then to- 
kenizes and runs it through a wide-coverage high- 
performance morphological analyzer developed us- 



ing two-leve l morphology tools by Xerox ( Kart 



tunen, 1993 ). The preprocessor module also per- 
forms a number of additional functions such as 
grouping of lexicalized and n on-lexicalized colloca- 
tions , compound verbs, etc. , ( Oflazcr and Kuruoz 



1994; Oflazer and Tiir, 1996). The preprocessor also 



uses a second morphological processor for dealing 
with unknown words which recovers any derivational 
and infiectional information from a word even if the 
root word is not known. This unknown word pro- 
cessor has a (nominal) root lexicon which recognizes 
S^, where S is the Turkish surface alphabet (in the 
two-level morphology sense), but then tries to in- 
terpret an arbitrary postfix string of the unknown 
word, as a sequence of Turkish suffixes subject to 



all morphographemic constraints (Oflazer and Tiir 



1996) 



We have applied our approach to four texts la- 
beled ARK, HIST, MAN, EMB, with statistics 
given m Table 0. The tokens considered are those 
that are generated after morphological analysis, un- 
known word processing and any lexical coalescing is 
done. The words that are counted as unknown are 
those that could not even be processed by the un- 
known noun processor as they violate Turkish mor- 
phographemic constraints. Whenever an unknown 



word has more than one parse it is counted under the 
appropriate group.^ The fourth and fifth columns in 
this table give the average parses per token and the 
initial precision assuming initial recall is 100%. 

We have disambiguated these texts using a rule 
base of about 500 hand-crafted rules. Most of the 
rule crafting was done using the general linguistic 
constraints and constraints that we derived from the 
first text, ARK. In this sense, this text is our "train- 
ing data" , while the other three texts were not con- 
sidered in rule crafting. 

Our results are summarized in Table |^. The last 
four columns in this table present results for differ- 
ent values for the parameter m mentioned above, 
m — I denoting the case when only the highest 
scoring parse(s) is (are) selected. The columns for 
TO < 1 are presented in order to emphasize that dras- 
tic loss of precision for those cases. Even atm — 0.95 
there is considerable loss of precision and going up 
to m = 1 causes a dramatic increase in precision 
without a significant loss in recall. It can be seen 
that we can attain very good recall and quite ac- 
ceptable precision with just voting constraint rules. 
Our experience is that we can in principle add highly 
specialized rules by covering a larger text base to 
improve our recall and precision for the m = 1. A 
post-mortem analysis has shown that cases that have 
been missed are mostly due to morphosyntactic de- 
pendencies that span a context much wider that 5 
tokens that we currently employ. 





Vote Range Selected(m) 


TEXT 


1.0 


0.95 


0.8 


0.6 


ARK 


Rec. 


98.05 


98.47 


98.69 


98.77 




Prec. 


94.13 


87.65 


84.41 


82.43 




Amb. 


1.042 


1.123 


1.169 


1.200 


HIST 


Rec. 


97.03 


97.65 


98.81 


97.01 




Prec. 


94.13 


87.10 


84.41 


82.29 




Amb. 


1.058 


1.121 


1.169 


1.189 


MAN 


Rec. 


97.03 


97.92 


97.81 


98.77 




Prec. 


91.05 


83.51 


79.85 


77.34 




Amb. 


1.068 


1.172 


1.237 


1.277 


EMB 


Rec. 


96.51 


97.48 


97.76 


97.94 




Prec. 


91.28 


84.36 


77.87 


75.79 




Amb. 


1.057 


1.150 


1.255 


1.292 



Table 2: Results with voting constraints 



®The reason for the (comparatively) high number of 
unknown words in MAN, is that tokens found in such 
texts, like flO, denoting a function key in the computer 
can not be parsed as a Turkish root word! 



4.1 Using root and contextual statistics 

We have employed two additional sources of infor- 
mation: root word usage statistics, and contextual 
statistics. We have statistics compiled from previ- 
ously disambiguated text, on root frequencies. After 
the application of constraints as described above, for 
tokens which are still ambiguous with ambiguity re- 
sulting from different root words, we discard parses 
if the frequencies of the root words for those parses 
are considerably lower than the frequency of the root 
of the highest scoring parse. The results after apply- 
ing this step on top of voting, with to = 1, are shown 
in the fourth column of Table ^ (labeled V+R) . 



TEXT 


V 


V-l-R 


V+R-HC 


ARK 


Rec. 


98.05 


97.60 


96.98 




Prec. 


94.13 


95.28 


96.19 




Amb. 


1.042 


1.024 


1.008 


HIST 


Rec. 


97.03 


96.52 


95.62 




Prec. 


94.13 


92.59 


94.33 




Amb. 


1.058 


1.042 


1.013 


MAN 


Rec. 


97.03 


96.47 


95.84 




Prec. 


91.05 


93.08 


94.47 




Amb. 


1.058 


1.042 


1.014 


EMB 


Rec. 


96.51 


96.47 


95.37 




Prec. 


91.28 


93.08 


94.45 




Amb. 


1.057 


1.036 


1.009 



Table 3: Results with voting constraints and root 
statistics, context statistics 

On top of this, we use the following heuristic us- 
ing context statistics to eliminate any further ambi- 
guities. For every remaining ambiguous token with 
unambiguous immediate left and right contexts (i.e., 
the tokens in the immediate left and right are unam- 
biguous), we perform the following, by ignoring the 
root/stem feature of the parses: 

1. For every ambiguous parse in such an unam- 
biguous context, we count how many times, this 
parse occurs unambiguously in exactly the same 
unambiguous context, in the rest of the text. 

2. We then choose the parse whose count is sub- 
stantially higher than the others. 

The results after applying this step on of the previ- 
ous two steps are shown in the last column of Table 
^ (labeled V-fR-|-C). One can see from the last three 
columns of this table, the impact of each of the steps. 

By ignoring root/stem features during this pro- 
cess, we essentially are considering just the top level 
inflectional information of the parses. This is very 



Text 


Sent. 


Tokens 


Parses/ 
Token 


Init. 
Prec. 


Distribution 
of 

Morphological Parses 





1 


2 


3 


4 


> 4 


ARK 


492 


7928 


1.823 


0.55 


0.15% 


49.34% 


30.93% 


9.19% 


8.46% 


1.93% 


HIST 


270 


5212 


1.797 


0.56 


0.02% 


50.63% 


30.68% 


8.62% 


8.36% 


1.69% 


MAN 


204 


2756 


1.840 


0.54 


0.65% 


49.01% 


31.70% 


6.37% 


8.91% 


3.36% 


EMB 


198 


5177 


1.914 


0.52 


0.09% 


43.94% 


34.58% 


9.60% 


9.46% 


2.33% 



Table 1: Statistics on Texts 



similar to Brill's use of contexts to induce transfor- 



mation rules for his tagger (Brill, 1992; Brill, 1995), 
but instead of generating transformation rules from 
a training text, we gather statistics and apply them 
to parses in the text being disambiguated. 

5 Efficient Implementation 
Techniques and Extensions 

The current implementation of the voting approach 
is meant to be a proof of concept implementation 
and is rather inefficient. However, the use of regular 
relations and finite state transducers (Kaplan and 



Kay, 1994) provide a very efficient implementation 



method. For this, we view the parses of the tokens 
making up a sentence as making up a acyclic a fi- 
nite state recognizer with the states marking word 
boundaries and the ambiguous interpretations of the 
tokens as the state transitions between states, the 



lected. The key point here is that due to the nature 
of the composition operator, the constraint transduc- 
ers can be composed off-line first, giving a single con- 
straint transducer and then this one is composed with 
every sentence transducer once (See Figure 

The idea of voting can further be extended to a 
path voting framework where rules vote on paths 
containing sequences of matching parses and the 
path from the start state to the final stated with 
the highest votes received, is then selected. This can 
be implemented again using finite state transducers 
as described above (except that path vote is appor- 
tioned equally to relevant parse votes), but instead 
of selecting highest scoring parses, one selects the 
path from the start state to one of the final states 
where the sum of the parse votes is maximum. We 
have recently completed a prototype implementation 
of this approach (in C) for English (Brown Corpus) 



rightmost node denoting the final state, as depicted 
in Figure ^ for a sentence with 5 tokens. In Figure |^, 
the transition labels are triples of the sort {wi,pj, 0) 
for the j*'' parse of token i, with the indicating 
the initial vote of the parse. The rules imposing 
constraints can also be represented as transducers 
which increment the votes of the matching transi- 
tion labels by an appropriate amount.]^ Such trans- 
ducers ignore and pass through unchanged, parses 
that they are not sensitive to. 

When a finite state recognizer corresponding to 
the input sentence (which actually may be consid- 
ered as an identity transducer) is composed with a 
constraint transducer, one gets a slightly modified 
version of the sentence transducer with possibly ad- 
ditional transitions and states, where the votes of 
some of the labels have been appropriately incre- 
mented. When the sentence transducer is composed 
with all the constraint transducers in sequence, all 
possible votes are cast and the final sentence trans- 
ducer reflects all the votes. The parse corresponding 
to each token with the highest vote can then be se- 



and have obtained quite similar results (Tiir, Of 



lazer, and Oz-kan, 1997) 



'^Suggested by Lauri Karttunen (private communica- 
tion) . 



6 Conclusions 

We have presented an approach to constraint-based 
morphological disambiguation which uses constraint 
voting as its primary mechanism for parse selec- 
tion and alleviates the rule developer from worrying 
about rule ordering issues. Our approach is quite 
general and is applicable to any language. Rules de- 
scribing language specific linguistic constraints vote 
on matching parses of tokens, and at the end, parses 
for every token receiving the highest tokens are se- 
lected. We have applied this approach to Turkish, 
a language with complex agglutinative word forms 
exhibiting morphological ambiguity phenomena not 
usually found in languages like English and have ob- 
tained quite promising results. The convenience of 
adding new rules in without worrying about where 
exactly it goes in terms of rule ordering (some- 
thing that hampered our progress in our earlier work 
on disambiguating Turkish morphology (Oflazer and 
Kuruoz, 1994 ; Oflazer and Tiir, 1996| ) ) , has also been 
a key positive point. Furthermore, it is also possible 
to use rules with negative votes to disallow impos- 



(wl,pl,0) (w2,pl,0) (w3,pl,0) (w4,pl,0) (w5,pl,0) 




(wl,p3,0) (w2,p5,0) (w3,p4,0) (w4,p3,0) (w5,p4,0) 

Figure 1: Sentence as a finite state recognizer. 



(wl,pl, 0) 



(w2,pl, 0) 



(w3,pl, 0) 



{w4,pl, 0) 



(w5,pl, 0) 




(wl,p3,0) 



(wl,pl,5) 



(wl,p3,4) 



(w2,p5,0) (w3,p4,0) 

i 



(w4,p3,0) 



Single transducer 
composed from all 
constraint 
transducers 



Constraint 
Transducer n {+Vn) 




(w5,p4, 0) 

" Composition of the 
sentence transducer 
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sible cases. This has been quite useful for our work 
on tagging EngUsh (Tiir, Oflazer, and Oz-kan, 1997) 
where such rules with negative weights were used to 
fine tune the behavior of the tagger in various prob- 
lematic cases. 

The proposed approach is also amenable to an 
efficient implementation by finite state transducers 
( Kaplan and Kay, 1994 ). By using finite state trans- 
ducers, it is furthermore possible to use a bit more 
expressive rule formalism including for instance the 
Kleene * operator so that one can use a much smaller 
set of rules to cover the same set of local linguistic 
phenomena. 

Our current and future work in this framework 
involves the learning of constraints and their votes 
from corpora, and combining learned and hand- 
crafted rules. 
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